
build: Extend tokenizer capabilities#1114

Merged
msluszniak merged 1 commit into main from @bo/bumpTokenizerCapabilities on May 6, 2026

Conversation

@benITo47 (Contributor) commented Apr 29, 2026

Description

This PR introduces rebuilt binaries that contain new, updated tokenizers.
This iteration adds support for more tokenization models (e.g. Unigram, WordLevel) as well as a number of previously unsupported pre-tokenizers, decoders, and post-processors.
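The Unigram model mentioned above scores every possible segmentation of the input and keeps the one with the highest total log-probability. A minimal stdlib-only sketch of that idea (the vocabulary and scores here are made up for illustration; real models learn them, and this is not the library's actual implementation):

```python
import math

# Toy unigram vocabulary: piece -> log-probability (invented scores).
VOCAB = {
    "▁": -2.0, "▁un": -3.0, "un": -3.5, "i": -2.5,
    "gram": -3.0, "g": -4.0, "ram": -4.5, "▁unigram": -10.0,
}

def unigram_tokenize(text: str) -> list[str]:
    """Viterbi search for the highest-scoring segmentation of `text`."""
    n = len(text)
    best = [-math.inf] * (n + 1)   # best total score for text[:i]
    back = [0] * (n + 1)           # start index of the winning last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            score = VOCAB.get(text[start:end])
            if score is not None and best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Walk the backpointers to recover the pieces.
    pieces, pos = [], n
    while pos > 0:
        pieces.append(text[back[pos]:pos])
        pos = back[pos]
    return pieces[::-1]

print(unigram_tokenize("▁unigram"))  # → ['▁un', 'i', 'gram']
```

The split wins here because its combined score (-8.5) beats both the whole-word piece (-10.0) and any finer split; that max-probability search is what distinguishes Unigram from greedy merge-based BPE.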

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Before merging, test all demo applications. Check that all models that proved problematic during past bumps (e.g. Kokoro, multi-method models) still work.
Check all LLM models and verify that their output is correct.

  • LLM app on iOS
  • LLM app on Android
  • Speech app on iOS
  • Speech app on Android
  • Text Embeddings on iOS
  • Text Embeddings on Android

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@msluszniak force-pushed the @bo/bumpTokenizerCapabilities branch from 78b5a13 to f1341d2 on April 30, 2026 12:14
@msluszniak (Member) left a comment

Ok, tested all Android demo apps and they worked. Unfortunately, I don't have an iOS device with me. We need someone to test iOS as well, and then I think we are ready to ship.

@chmjkb (Collaborator) commented May 4, 2026

> Ok, tested all Android demo apps and they worked. Unfortunately, I don't have an iOS device with me. We need someone to test iOS as well, and then I think we are ready to ship.

I'll take a look tomorrow.

@chmjkb (Collaborator) commented May 4, 2026

Is there any particular tokenizer this should be tested with?

@msluszniak (Member) commented May 4, 2026

> Is there any particular tokenizer this should be tested with?

Yes, Unigram. You can test it by running the model from this PR: #1115

@msluszniak (Member) left a comment

🚀

@msluszniak merged commit e937c36 into main on May 6, 2026
4 checks passed
@msluszniak deleted the @bo/bumpTokenizerCapabilities branch on May 6, 2026 09:12
msluszniak added a commit that referenced this pull request May 7, 2026
…#1115)

## Description

Adds the `paraphrase-multilingual-MiniLM-L12-v2` sentence-transformer
model — the second multilingual embeddings model after distiluse,
completing #945. Ships **only the XNNPACK 8da4w variant** under
`MODEL_REGISTRY.ALL_MODELS` (see "Why a single variant" below).

384-d output, max 126 tokens, 50+ languages. Tokenizer is Unigram +
Precompiled normalizer + Metaspace decoder — **requires the bumped
`pytorch/extension/llm/tokenizers` runtime from #1114**, so this PR
blocks on that landing first and should be rebased onto main once #1114
merges.
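The Metaspace decoder in that pipeline maps the "▁" marker inserted at encode time back to spaces when pieces are joined. A rough stdlib-only approximation of that behavior (the real decoder lives in the Rust `tokenizers` runtime; this sketch assumes the common `add_prefix_space=True` setting):

```python
def metaspace_decode(pieces: list[str], replacement: str = "\u2581") -> str:
    """Join tokenizer pieces and map the Metaspace marker back to spaces.

    With a prefix space added at encode time, the leading marker is
    dropped so the decoded string does not start with a space.
    """
    text = "".join(pieces).replace(replacement, " ")
    return text[1:] if text.startswith(" ") else text

print(metaspace_decode(["\u2581Hello", "\u2581wor", "ld", "!"]))
# → Hello world!
```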

HF repo:
[software-mansion/react-native-executorch-paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/software-mansion/react-native-executorch-paraphrase-multilingual-MiniLM-L12-v2)
(`v0.9.0` tag, layout mirrors distiluse).

**Why a single variant** —
TL;DR: 8da4w was the fastest of all variants and also one of the smallest,
with no loss in precision.
Longer answer:
unlike distiluse, where Core ML fp32 won on iPhone thanks to ANE
acceleration, benchmarks on iPhone 17 Pro + OnePlus 12 (~80-token input,
50 measured forwards after 3 warmups) showed the XNNPACK 8da4w variant
Pareto-dominates the other three on both platforms: faster than XNNPACK
fp32, Core ML fp32 *and* Core ML fp16 on iPhone, and ~36% smaller
steady-state memory footprint than the next-best variant. Likely cause:
paraphrase-multilingual-MiniLM-L12-v2 is a smaller model (~118 M params,
12 layers) where Core ML's runtime doesn't push enough work onto ANE for
the precision-conversion overhead to pay off. fp16 being slower than
fp32 on Core ML for this model is a tell that the runtime is falling
back to slower compute units. Shipping only `_8DA4W` keeps the public
surface aligned with the data; if a future Core ML or model update flips
the verdict, it will be easy to add the other variants back.
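The measurement protocol described above (warmup forwards discarded, then a batch of timed forwards) can be sketched generically. The `forward` callable below is a stand-in for the real model invocation, not the actual ExecuTorch API:

```python
import statistics
import time

def benchmark(forward, warmups: int = 3, runs: int = 50) -> float:
    """Median forward-pass latency in ms, after discarding warmup runs."""
    for _ in range(warmups):
        forward()  # warm caches, JIT, and backend delegate initialization
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        forward()
        samples.append((time.perf_counter() - t0) * 1000.0)  # ms
    return statistics.median(samples)

# Stand-in workload in place of a real embedding forward pass.
median_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {median_ms:.2f} ms")
```

The median is used rather than the mean so that occasional scheduler hiccups on-device don't skew the comparison between variants.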

**Memory methodology note** — the new paraphrase row in
`docs/docs/02-benchmarks/memory-usage.md` reports RSS / `phys_footprint`
deltas from a clean app baseline (loaded − idle), captured on-device at
the same conceptual point. The existing distiluse rows there (36 / 44
MB) come from an older measurement pass with a different (and not
reconstructable from the diff) methodology, so the two rows are not
directly comparable. A separate pass to re-measure distiluse and other
rows with the same methodology would be a good follow-up.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Testing instructions

1. `cd apps/text-embeddings && npx expo run:ios` (or `run:android`).
2. Pick **"Multilingual Paraphrase (8da4w)"** in the model picker.
3. Add a sentence in one language, query with an aligned sentence in
another (e.g. Polish "Słoneczko" against "It's so sunny outside!"). The
cross-lingual pair should top the matches.
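The ranking in step 3 boils down to cosine similarity between the query embedding and each stored sentence embedding. A toy sketch of that check (the sentences come from the instructions above, but the vectors are invented 4-d stand-ins for the model's real 384-d output):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Made-up embeddings standing in for the model's 384-d vectors.
db = {
    "It's so sunny outside!": [0.9, 0.1, 0.2, 0.0],
    "The meeting is at noon.": [0.0, 0.8, 0.1, 0.3],
}
query = [0.85, 0.15, 0.25, 0.05]  # pretend embedding of Polish "Słoneczko"

best = max(db, key=lambda s: cosine(query, db[s]))
print(best)  # → It's so sunny outside!
```

If the cross-lingual pair does not come out on top with the real model, that is a signal the Unigram tokenizer path from #1114 is mis-segmenting the non-English input.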

### Related issues

Closes the paraphrase-multilingual half of #945 (the distiluse half
landed in #1098).

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes

Blocks on #1114.

Labels

chore PRs that are chores
